Large language models

# Large language models

Upstage AI

Upstage AI leverages powerful large language models and document processing engines to transform workflows and improve efficiency for enterprises. Its main advantages include high accuracy, high performance, and customized solutions for various industries. It is positioned to empower leading enterprises and enhance work efficiency.

Document processing

ComfyGen

ComfyGen is an adaptive workflow system focused on text-to-image generation that automates and tailors effective workflows by learning from user prompts. The emergence of this technology marks a shift from using a single model to incorporating multiple specialized components in complex workflows aimed at enhancing image generation quality. A key advantage of ComfyGen is its ability to automatically adjust workflows based on user text prompts, making it especially valuable for users who need to generate images in specific styles or themes.

AI image generation

DCLM

DataComp-LM (DCLM) is a comprehensive framework for building and training large language models (LLMs), providing standardized corpora, efficient pre-training recipes based on the open_lm framework, and over 50 evaluation methods. DCLM supports researchers in experimenting with different data set construction strategies at different computational scales, from 411M to 7B parameter models. DCLM significantly improves model performance through optimized dataset design and has already facilitated the creation of multiple high-quality datasets that outperform all open datasets at different scales.

Make-An-Audio 2

Make An Audio 2

Make-An-Audio 2 is a text-to-audio generation technology based on diffusion models, co-developed by researchers from Zhejiang University, ByteDance, and the Chinese University of Hong Kong. This technology utilizes pre-trained large language models (LLMs) to parse text, optimizing for semantic alignment and temporal consistency, thereby improving the quality of generated audio. It also incorporates a feed-forward Transformer-based diffusion denoiser to enhance performance in generating variable-length audio and bolster the extraction of temporal information. Furthermore, by leveraging LLMs to convert abundant audio label data into audio-text datasets, the issue of time data scarcity is addressed.

AI Music Generation

BrainSoup

With BrainSoup, you can create custom AI agents to handle tasks and automate workflows through natural language. Enhance AI capabilities with your data while maintaining the highest levels of privacy and security. BrainSoup supports multiple large language models and semantic core technologies, making AI agents more powerful and personalized.

Intelligent Body

FP6-LLM

FP6-LLM is a new supporting solution for large language models. Through six-bit quantization (FP6), it effectively reduces the model size while maintaining model quality across various applications. We present TC-FPx, the first complete GPU kernel design that uniformly supports various quantization bit widths for floating-point weights. By integrating the TC-FPx kernel into existing inference systems, we provide a new end-to-end support for quantized LLM inference (called FP6-LLM), achieving a better balance between inference cost and model quality. Experiments demonstrate that FP6-LLM enables inference of LLaMA-70b using a single GPU, achieving normalized inference throughput 1.69x to 2.65x higher than the FP16 baseline.

StyleTTS 2

StyleTTS 2 is a text-to-speech (TTS) model that utilizes large speech language models (SLMs) for style diffusion and adversarial training, achieving human-level TTS synthesis. It employs a diffusion model to model style as a latent stochastic variable, generating the most appropriate style for the given text without relying on voice references. Furthermore, we utilize large pre-trained SLMs (such as WavLM) as discriminators and incorporate our innovative differentiable duration modeling for end-to-end training, enhancing the naturalness of the synthesized speech. StyleTTS 2 surpasses human recordings on the single-speaker LJSpeech dataset and matches them on the multi-speaker VCTK dataset, garnering recognition from native English-speaking evaluators. Additionally, when trained on the LibriTTS dataset, our model outperforms prior publicly available zero-shot extension models. By demonstrating the potential of style diffusion and adversarial training with large SLMs, this work achieves human-level TTS synthesis on both single and multi-speaker datasets.

AI speech synthesis

Featured AI Tools

Flow AI

Flow is an AI-driven movie-making tool designed for creators, utilizing Google DeepMind's advanced models to allow users to easily create excellent movie clips, scenes, and stories. The tool provides a seamless creative experience, supporting user-defined assets or generating content within Flow. In terms of pricing, the Google AI Pro and Google AI Ultra plans offer different functionalities suitable for various user needs.

Video Production

NoCode

NoCode is a platform that requires no programming experience, allowing users to quickly generate applications by describing their ideas in natural language, aiming to lower development barriers so more people can realize their ideas. The platform provides real-time previews and one-click deployment features, making it very suitable for non-technical users to turn their ideas into reality.

Development Platform

ListenHub

ListenHub is a lightweight AI podcast generation tool that supports both Chinese and English. Based on cutting-edge AI technology, it can quickly generate podcast content of interest to users. Its main advantages include natural dialogue and ultra-realistic voice effects, allowing users to enjoy high-quality auditory experiences anytime and anywhere. ListenHub not only improves the speed of content generation but also offers compatibility with mobile devices, making it convenient for users to use in different settings. The product is positioned as an efficient information acquisition tool, suitable for the needs of a wide range of listeners.

MiniMax Agent

MiniMax Agent is an intelligent AI companion that adopts the latest multimodal technology. The MCP multi-agent collaboration enables AI teams to efficiently solve complex problems. It provides features such as instant answers, visual analysis, and voice interaction, which can increase productivity by 10 times.

Multimodal technology

Tencent Hunyuan Image 2.0

Tencent Hunyuan Image 2.0

Tencent Hunyuan Image 2.0 is Tencent's latest released AI image generation model, significantly improving generation speed and image quality. With a super-high compression ratio codec and new diffusion architecture, image generation speed can reach milliseconds, avoiding the waiting time of traditional generation. At the same time, the model improves the realism and detail representation of images through the combination of reinforcement learning algorithms and human aesthetic knowledge, suitable for professional users such as designers and creators.

Image Generation

OpenMemory MCP

OpenMemory is an open-source personal memory layer that provides private, portable memory management for large language models (LLMs). It ensures users have full control over their data, maintaining its security when building AI applications. This project supports Docker, Python, and Node.js, making it suitable for developers seeking personalized AI experiences. OpenMemory is particularly suited for users who wish to use AI without revealing personal information.

FastVLM

FastVLM is an efficient visual encoding model designed specifically for visual language models. It uses the innovative FastViTHD hybrid visual encoder to reduce the time required for encoding high-resolution images and the number of output tokens, resulting in excellent performance in both speed and accuracy. FastVLM is primarily positioned to provide developers with powerful visual language processing capabilities, applicable to various scenarios, particularly performing excellently on mobile devices that require rapid response.

Image Processing

LiblibAI

LiblibAI is a leading Chinese AI creative platform offering powerful AI creative tools to help creators bring their imagination to life. The platform provides a vast library of free AI creative models, allowing users to search and utilize these models for image, text, and audio creations. Users can also train their own AI models on the platform. Focused on the diverse needs of creators, LiblibAI is committed to creating inclusive conditions and serving the creative industry, ensuring that everyone can enjoy the joy of creation.

AIbase

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご

© 2025AIbase